Tool Decathlon

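Tool Decathlon (Toolathlon for short) is a benchmark for language agents that evaluates how well large models use tools to carry out complex tasks in realistic environments. It spans 32 software applications and 604 tools, ranging from everyday tools such as Google Calendar and Notion to professional ones such as WooCommerce, Kubernetes, and BigQuery. It contains 108 tasks, each requiring about 20 tool interactions on average. Released in October 2025, it aims to fill gaps in existing evaluations around tool diversity and long-sequence execution. Through execution-based evaluation, the benchmark provides reliable performance …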

Updated Apr 25, 2026·765 views
Current SOTA: Kimi K2.6 (Moonshot AI), score 50
Problem count: 108
Institution: Individual
Category: AI Agent - Tool use
Metrics: Accuracy
Language: English
Difficulty: High

Overview

Tool Decathlon is a benchmark for evaluating how well large models use tools to carry out complex tasks in realistic environments.
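
The description above names execution-based evaluation and accuracy as the metric. The following is a minimal sketch of how such a score could be computed, assuming each task comes with a checker that inspects the environment state left behind by the agent. The names `TaskResult`, `accuracy`, the checker functions, and the two example tasks are hypothetical and are not taken from the Toolathlon code; the official scoring may differ, for example by averaging over several runs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TaskResult:
    """Outcome of one agent run on one task (hypothetical structure)."""
    task_id: str
    final_state: Dict[str, object]  # environment state after the agent stops
    tool_calls: int                 # number of tool interactions used


# A checker returns True when the final state satisfies the task goal.
Checker = Callable[[Dict[str, object]], bool]


def accuracy(results: List[TaskResult], checkers: Dict[str, Checker]) -> float:
    """Return the percentage of tasks whose final state passes its checker."""
    if not results:
        return 0.0
    passed = sum(1 for r in results if checkers[r.task_id](r.final_state))
    return 100.0 * passed / len(results)


# Two made-up tasks: one succeeds, one leaves the wrong replica count.
checkers = {
    "calendar_event": lambda s: s.get("event_created") is True,
    "k8s_deploy": lambda s: s.get("replicas") == 3,
}
results = [
    TaskResult("calendar_event", {"event_created": True}, tool_calls=18),
    TaskResult("k8s_deploy", {"replicas": 2}, tool_calls=22),
]
score = accuracy(results, checkers)
print(f"{score:.2f}")  # prints 50.00
```

Judging the final state rather than the sequence of tool calls means any valid route to the goal counts as a pass, which suits tasks that take around 20 interactions and can be solved in more than one way.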

Related resources

  • View Paper
  • Get Dataset
  • Official Website
  • DataLearner Blog

Latest Tool Decathlon model rankings and full benchmark leaderboard

Browse the latest scores, model modes, release dates, and parameter sizes for Tool Decathlon.

Source: DataLearnerAI

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.

Tool Decathlon Rank

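| Rank | Model | Developer | Mode | Score | Release date | Parameters | License |
|------|-------|-----------|------|-------|--------------|------------|---------|
| 1 | Kimi K2.6 | Moonshot AI | Thinking Enabled, Tools | 50.00 | 2026-04-20 | 1000B | Free Commercial |
| 2 | GPT-5.4 mini | OpenAI | Thinking Level · Extra High, Tools | 42.90 | 2026-03-17 | Unknown | Closed |
| 3 | GLM 5.1 | Zhipu AI | Thinking Enabled, Tools | 40.70 | 2026-03-27 | 75.4B | Free Commercial |
| 4 | Qwen 3.6 Plus Preview | Alibaba | Thinking Enabled, Tools | 39.80 | 2026-03-31 | Unknown | Closed |
| 5 | Qwen3.5-397B-A17B | Alibaba | Thinking Enabled, Tools | 38.30 | 2026-02-16 | 39.7B | Free Commercial |
| 6 | GPT-5.4 nano | OpenAI | Thinking Level · Extra High, Tools | 35.50 | 2026-03-17 | Unknown | Closed |
| 7 | Qwen3.6-35B-A3B | Alibaba | Thinking Enabled | 26.90 | 2026-04-16 | 35B | Free Commercial |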